Fixes Issue #328 #329

RupertAvery · 2023-04-29T13:21:51Z (edited by drewnoakes)

Fixes #328

Parse the keyword and text for the directory using UTF-8 encoding when the PNG chunk type is `iTXt`.
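To illustrate what the change does, here is a minimal sketch in Java (the language of the port in drewnoakes/metadata-extractor#611). `readNullTerminated` is a hypothetical helper loosely modelled on the `GetNullTerminatedStringValue(maxLengthBytes: 79)` call in the diff below; it is not the library's API. The sketch decodes the same NUL-terminated bytes as Latin-1 and as UTF-8. Latin-1 maps every byte to one character, so any multi-byte UTF-8 sequence comes out garbled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ItxtDecodingDemo {
    // Hypothetical helper (not the metadata-extractor API): decodes the bytes before
    // the first NUL, reading at most maxLengthBytes of them, with the given charset.
    static String readNullTerminated(byte[] data, int maxLengthBytes, Charset charset) {
        int limit = Math.min(data.length, maxLengthBytes);
        int end = 0;
        while (end < limit && data[end] != 0) {
            end++;
        }
        return new String(data, 0, end, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an acute e (U+00E9), plus its NUL separator, encoded as UTF-8.
        // The accented letter becomes the two bytes 0xC3 0xA9.
        byte[] field = "caf\u00e9\0".getBytes(StandardCharsets.UTF_8);

        // Latin-1 turns each of those two bytes into its own character (U+00C3, U+00A9).
        String asLatin1 = readNullTerminated(field, 79, StandardCharsets.ISO_8859_1);
        // UTF-8 reassembles them into the single original character.
        String asUtf8 = readNullTerminated(field, 79, StandardCharsets.UTF_8);

        // prints "5 chars vs 4 chars"
        System.out.println(asLatin1.length() + " chars vs " + asUtf8.length() + " chars");
    }
}
```

The 79-byte cap mirrors the `maxLengthBytes` argument in the diff. Only the charset passed to the decoder changes between the two results.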

drewnoakes

Thanks, this looks great. I'll try it on the regression test data set before merging.

MetadataExtractor/Formats/Png/PngMetadataReader.cs

drewnoakes · 2023-05-01T01:56:56Z

```diff
@@ -248,7 +249,7 @@ private static IEnumerable<Directory> ProcessChunk(PngChunk chunk)
            else if (chunkType == PngChunkType.iTXt)
            {
                var reader = new SequentialByteArrayReader(bytes);
-                var keyword = reader.GetNullTerminatedStringValue(maxLengthBytes: 79).ToString(_latin1Encoding);
+                var keyword = reader.GetNullTerminatedStringValue(maxLengthBytes: 79).ToString(_utf8Encoding);
```


There's actually now an issue slightly below here. The `bytesLeft` value was computed from the length of the decoded string, which for Latin-1 equals its length in bytes because every character is one byte. With UTF-8 that's not the case, since a single non-ASCII character takes two to four bytes. I'll patch this up and push to your PR.
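A minimal Java sketch of the problem, assuming the remaining length is computed as the chunk length minus the keyword and its NUL separator (the actual .NET code for `bytesLeft` is not shown here, and every name below is hypothetical). Counting the decoded keyword's characters undercounts as soon as the keyword contains a multi-byte UTF-8 character. Locating the NUL separator in the raw bytes gives the correct byte count regardless of encoding.

```java
import java.nio.charset.StandardCharsets;

final class ItxtBytesLeft {
    // Index of the first NUL at or after 'from', or data.length if there is none.
    static int indexOfNul(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == 0) return i;
        }
        return data.length;
    }

    // Flawed: treats the decoded keyword's character count as its byte count.
    // This only holds when every character is encoded as one byte, as with Latin-1.
    static int bytesLeftFromCharCount(byte[] chunk) {
        String keyword = new String(chunk, 0, indexOfNul(chunk, 0), StandardCharsets.UTF_8);
        return chunk.length - (keyword.length() + 1);
    }

    // Sound: measures the keyword in bytes by finding its NUL separator in the raw data.
    static int bytesLeftFromByteCount(byte[] chunk) {
        int nul = indexOfNul(chunk, 0);
        return chunk.length - Math.min(chunk.length, nul + 1);
    }

    public static void main(String[] args) {
        // Keyword "cafe" with an acute e: 4 chars but 5 UTF-8 bytes, then NUL, then 3 bytes.
        byte[] chunk = "caf\u00e9\0abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytesLeftFromCharCount(chunk)); // prints 4 (off by one)
        System.out.println(bytesLeftFromByteCount(chunk)); // prints 3
    }
}
```

Measuring in the raw buffer also avoids re-encoding the decoded string to count its bytes, which would go wrong if the input held invalid UTF-8 that decoding replaced with U+FFFD.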

drewnoakes · 2023-05-01T02:00:50Z (edited)

I ran this against the regression test data set and it now successfully parses a bunch of previously broken values. I see no downside to this anywhere. Great stuff, thanks!

This is a port of a fix from the .NET library in drewnoakes/metadata-extractor-dotnet#329. PNG chunks of type `iTXt` should have their keywords and values decoded using UTF-8, not Latin-1.

@RupertAvery

Thanks to @RupertAvery for reporting the issue in drewnoakes/metadata-extractor-dotnet#328 and providing a fix in drewnoakes/metadata-extractor-dotnet#329. Ported to Java in drewnoakes/metadata-extractor#611.
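For context, the PNG specification lays out an `iTXt` chunk as: keyword, NUL, compression flag, compression method, language tag, NUL, translated keyword, NUL, text, with the translated keyword and text in UTF-8. The Java sketch below decodes an uncompressed chunk in that order. It is not the metadata-extractor implementation, and `ItxtParser`/`Itxt` are hypothetical names. It decodes the keyword as UTF-8 as this fix does, assumes the language tag is plain ASCII, and deliberately rejects compressed text (compression flag 1), which would additionally need a zlib inflate step. Offsets are advanced by byte positions, never by decoded string lengths.

```java
import java.nio.charset.StandardCharsets;

final class ItxtParser {
    // Decoded fields of an iTXt chunk (uncompressed case only).
    static final class Itxt {
        final String keyword, languageTag, translatedKeyword, text;

        Itxt(String keyword, String languageTag, String translatedKeyword, String text) {
            this.keyword = keyword;
            this.languageTag = languageTag;
            this.translatedKeyword = translatedKeyword;
            this.text = text;
        }
    }

    // Index of the first NUL at or after 'from'; throws if the separator is missing.
    private static int nul(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == 0) return i;
        }
        throw new IllegalArgumentException("missing NUL separator");
    }

    static Itxt parse(byte[] data) {
        int end = nul(data, 0);
        String keyword = new String(data, 0, end, StandardCharsets.UTF_8);
        int pos = end + 1;
        if (pos + 2 > data.length) throw new IllegalArgumentException("truncated chunk");
        int compressionFlag = data[pos] & 0xFF;
        pos += 2; // skip the compression flag and compression method bytes
        if (compressionFlag != 0) {
            throw new UnsupportedOperationException("compressed text not handled in this sketch");
        }
        end = nul(data, pos);
        String languageTag = new String(data, pos, end - pos, StandardCharsets.US_ASCII);
        pos = end + 1;
        end = nul(data, pos);
        String translatedKeyword = new String(data, pos, end - pos, StandardCharsets.UTF_8);
        pos = end + 1;
        // The text runs to the end of the chunk; its extent is measured in bytes.
        String text = new String(data, pos, data.length - pos, StandardCharsets.UTF_8);
        return new Itxt(keyword, languageTag, translatedKeyword, text);
    }

    public static void main(String[] args) {
        byte[] body = "Title\0\0\0fr\0Titre\0Caf\u00e9 cr\u00e8me".getBytes(StandardCharsets.UTF_8);
        Itxt t = parse(body);
        System.out.println(t.keyword + " [" + t.languageTag + "] " + t.translatedKeyword); // prints "Title [fr] Titre"
        System.out.println(t.text.length() + " chars from " + (body.length - 17) + " bytes"); // prints "10 chars from 12 bytes"
    }
}
```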

Fixes Issue drewnoakes#328

d2531f8

drewnoakes reviewed May 1, 2023

MetadataExtractor/Formats/Png/PngMetadataReader.cs (outdated)

drewnoakes reviewed May 1, 2023

drewnoakes added 3 commits May 1, 2023 12:04

Use built-in UTF8 encoding field

7c097e1

Fix byte length computation and add comment

efafc95

Merge remote-tracking branch 'origin/master' into fix-328

02750ea

drewnoakes mentioned this pull request May 1, 2023

Fix PNG iTXt encoding issue drewnoakes/metadata-extractor#611

Merged

drewnoakes added bug format-png labels May 1, 2023

drewnoakes approved these changes May 1, 2023

drewnoakes merged commit 7da96dd into drewnoakes:master May 1, 2023
